Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce
A QR factorization of a tall and skinny matrix with n columns can be
represented as a reduction. The operation used along the reduction tree takes
as input two n-by-n upper triangular matrices and produces as output the
n-by-n upper triangular matrix defined as the R factor of the two input
matrices stacked one on top of the other. This operation is binary,
associative, and commutative. We can therefore leverage the MPI library's
capabilities by using user-defined MPI operations and MPI_Reduce to perform
this reduction. The resulting code is compact and portable. In this context,
the user relies on the MPI library to select a reduction tree appropriate for
the underlying architecture.
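In a real implementation the combine operation is registered with MPI_Op_create and passed to MPI_Reduce. As a serial sketch of the operation itself (NumPy standing in for the MPI pieces; the function name and block count are illustrative), the combine step and the reduction it induces look like:

```python
import numpy as np

def combine_R(R1, R2):
    """Combine step of the reduction: given two n-by-n upper triangular
    factors, return the R factor of the 2n-by-n stacked matrix."""
    stacked = np.vstack([R1, R2])
    # mode='r' returns only the R factor of the QR factorization
    return np.linalg.qr(stacked, mode='r')

# Serial emulation of the reduction over row blocks of a tall matrix A
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
blocks = np.split(A, 4)                          # 4 "processes"
Rs = [np.linalg.qr(B, mode='r') for B in blocks] # local QR on each block
R = Rs[0]
for Ri in Rs[1:]:                                # reduction over the tree
    R = combine_R(R, Ri)
# Up to column signs, R now equals the R factor of A itself,
# so R^T R reproduces the Gram matrix A^T A.
```

With MPI, `combine_R` would be wrapped in a user-defined operation and the loop replaced by a single `MPI_Reduce` call, letting the library choose the tree shape.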
The Problem with the Linpack Benchmark Matrix Generator
We characterize the matrix sizes for which the Linpack Benchmark matrix
generator constructs a matrix with identical columns.
Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Low-rank Matrix Approximations
Factorizing large matrices by QR with column pivoting (QRCP) is substantially
more expensive than QR without pivoting, owing to communication costs required
for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have
proven empirically to be highly competitive in processing time with
high-performance implementations of QR on uniprocessor and shared-memory
machines, and as reliable as QRCP in pivot quality.
We show that RQRCP algorithms can be as reliable as QRCP with failure
probabilities exponentially decaying in oversampling size. We also analyze
efficiency differences among different RQRCP algorithms. More importantly, we
develop distributed memory implementations of RQRCP that are significantly
better than QRCP implementations in ScaLAPACK.
As a further development, we introduce the concept of and develop algorithms
for computing spectrum-revealing QR factorizations for low-rank matrix
approximations, and demonstrate their effectiveness against leading low-rank
approximation methods in both theoretical and numerical reliability and
efficiency.

Comment: 11 pages, 14 figures; accepted by the 2017 IEEE 24th International
Conference on High Performance Computing (HiPC); awarded the best paper prize.
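A minimal serial sketch of the RQRCP idea: column pivots are chosen by running QR with column pivoting on a small Gaussian sketch of A rather than on A itself. All names and the oversampling parameter p below are illustrative, not the paper's implementation; the pivoting here is a simple Gram–Schmidt form of QRCP.

```python
import numpy as np

def qrcp_pivots(B, k):
    """Greedy column pivoting (Gram-Schmidt form of QRCP): repeatedly
    select the column of largest residual norm and deflate the rest."""
    B = B.astype(float).copy()
    piv = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(B, axis=0)))
        piv.append(j)
        q = B[:, j] / np.linalg.norm(B[:, j])
        B -= np.outer(q, q @ B)      # remove the chosen direction
    return piv

def rqrcp_pivots(A, k, p=8, seed=None):
    """RQRCP sketch: pivot on a (k+p)-row Gaussian sketch of A.
    p is the oversampling size; larger p lowers the failure probability."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((k + p, m))  # Gaussian test matrix
    return qrcp_pivots(Omega @ A, k)         # pivot on the small sketch

# Demo on a nearly rank-3 matrix: the 3 selected columns span A well.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))
A += 1e-8 * rng.standard_normal((200, 50))
piv = rqrcp_pivots(A, k=3, seed=2)
Q, _ = np.linalg.qr(A[:, piv])
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
```

The communication advantage comes from the fact that all pivoting decisions are made on the small sketched matrix, which is cheap to form and to share across processes.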
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing
systems. Our approach is based on a careful adaptation of the
algorithm-based fault tolerance technique (Huang and Abraham, 1984) to the
needs of parallel distributed computation. We obtain a strongly scalable
mechanism for fault tolerance. We can also detect and correct errors (bit
flips) on the fly during a computation. To assess the viability of our
approach, we have developed a fault-tolerant matrix-matrix multiplication
subroutine and we propose some models to predict its running time. Our
parallel fault-tolerant matrix-matrix multiplication reaches 1.4 TFLOPS on
484 processors (the jacquard.nersc.gov cluster) and returns a correct result
even when one process failure occurs. This represents 65% of the machine's
peak and less than 12% overhead with respect to the fastest failure-free
implementation. We predict (and have observed) that, as the processor count
increases, the overhead of the fault tolerance drops significantly.
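The underlying checksum scheme of Huang and Abraham can be sketched as follows (a serial NumPy illustration of the encoding and the check, not the parallel subroutine described above; function names are ours):

```python
import numpy as np

def checksum_product(A, B):
    """Huang-Abraham checksum encoding: append a column-sum row to A and a
    row-sum column to B, then multiply. The result Cf contains A @ B in
    its leading block, with checksums in the last row and column."""
    Ac = np.vstack([A, A.sum(axis=0)])                 # column-checksum A
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # row-checksum B
    return Ac @ Br

def verify(Cf):
    """Return True when the product's checksums are consistent with the
    leading block, i.e. no entry has been corrupted."""
    C = Cf[:-1, :-1]
    return (np.allclose(Cf[-1, :-1], C.sum(axis=0)) and
            np.allclose(Cf[:-1, -1], C.sum(axis=1)))

A = np.arange(12.0).reshape(3, 4)
B = np.arange(8.0).reshape(4, 2)
Cf = checksum_product(A, B)
ok_before = verify(Cf)   # True on a fault-free run
Cf[1, 0] += 1.0          # simulate a bit flip in one entry
ok_after = verify(Cf)    # False: the corruption is detected
# A single corrupted entry sits at the intersection of the failing row
# and column checksums, so it can be located and corrected as well.
```

In the parallel setting these checksum rows and columns are distributed across processes, which is what makes the detection and recovery scalable.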
Computing the Conditioning of the Components of a Linear Least Squares Solution
In this paper, we address the accuracy of the results for the overdetermined
full rank linear least squares problem. We recall theoretical results obtained
in Arioli, Baboulin and Gratton, SIMAX 29(2):413--433, 2007, on conditioning of
the least squares solution and the components of the solution when the matrix
perturbations are measured in Frobenius or spectral norms. Then we define
computable estimates for these condition numbers and we interpret them in terms
of statistical quantities. In particular, we show that, in the classical
linear statistical model, the ratio of the variance of one component of the
solution to the variance of the right-hand side is exactly the condition
number of this solution component when perturbations on the right-hand side
are considered. We also provide code fragments using LAPACK routines to
compute the variance-covariance matrix and the least squares conditioning,
and we give the corresponding computational cost. Finally, we present a small
historical numerical example that was used by Laplace in Théorie Analytique
des Probabilités, 1820, for computing the mass of Jupiter, and experiments
from the space industry with real physical data.
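One such computable quantity can be sketched with NumPy standing in for the LAPACK routines (the function name is ours): in the classical model the variance-covariance matrix is sigma^2 (A^T A)^{-1}, and its diagonal, which scales the variances of the individual solution components, can be read off the R factor of A's QR factorization without ever forming the normal equations.

```python
import numpy as np

def solution_component_variances(A):
    """Diagonal of (A^T A)^{-1}, computed from the R factor of A.
    Since A^T A = R^T R, we have (A^T A)^{-1} = R^{-1} R^{-T}, whose
    diagonal entries are the squared row norms of R^{-1}. Scaled by
    sigma^2, these are the variances of the solution components."""
    R = np.linalg.qr(A, mode='r')       # n-by-n upper triangular factor
    Rinv = np.linalg.inv(R)             # a triangular solve in practice
    return (Rinv ** 2).sum(axis=1)      # squared row norms of R^{-1}

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
d = solution_component_variances(A)
# d agrees with the diagonal of (A^T A)^{-1} computed directly,
# but avoids squaring the condition number via the normal equations.
```

In LAPACK terms this corresponds to a QR factorization followed by a triangular inversion of the small n-by-n factor, which is what makes the estimate cheap relative to the factorization itself.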